Large-Scale Corpus-Driven PCFG Approximation of an HPSG
Abstract
We present a novel corpus-driven approach to grammar approximation for a linguistically deep Head-driven Phrase Structure Grammar (HPSG). With an unlexicalized probabilistic context-free grammar (PCFG) obtained by maximum likelihood estimation on a large-scale, automatically annotated corpus, we achieve higher parsing accuracy than the original HPSG-based model. We propose and compare different ways of enriching the annotations carried by the approximating PCFG. A comparison with the state-of-the-art latent-variable PCFG shows that our approach is better suited to the grammar approximation task, where training data can be acquired automatically. The best approximating PCFG achieved a ParsEval F1 accuracy of 84.13%. The high robustness of the PCFG suggests that it is a viable way of achieving full-coverage parsing with hand-written deep linguistic grammars.
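As a rough illustration of what maximum likelihood estimation of a PCFG involves, the sketch below computes rule probabilities as relative frequencies over a treebank. The tree encoding, the function names, and the two-tree toy treebank are assumptions made for this example; they are not taken from the paper, whose corpus and annotation scheme are far richer.

```python
from collections import Counter

# Hypothetical tree encoding: (label, child, child, ...); leaves are plain strings.
TOY_TREEBANK = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "chase"), ("NP", "dogs"))),
]


def count_rules(tree, counts):
    """Count every rule LHS -> RHS used in one tree."""
    label, *children = tree
    # The right-hand side is the sequence of child labels (or words for leaves).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(trees):
    """Relative-frequency (maximum likelihood) estimate of rule probabilities."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    # P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(TOY_TREEBANK).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

Because the estimate is a simple count ratio, it scales to very large automatically annotated corpora; enriching the annotations amounts to changing the labels that appear in the trees before counting.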
Related resources
Robust Parsing, Meaning Composition, and Evaluation
In the larger context of parsing for semantic interpretation, we present and evaluate a novel approach to corpus-driven approximation of linguistically rich, constraint-based grammars. We obtain an unlexicalized probabilistic context-free grammar (PCFG) from a very large corpus that is automatically annotated with the fine-grained syntacto-semantic analyses of a broad-coverage Head-Driven Phras...
Parse Selection with a German HPSG Grammar
We report on some recent parse selection experiments carried out with GG, a large-scale HPSG grammar for German. Using a manually disambiguated treebank derived from the Verbmobil corpus, we achieve over 81% exact match accuracy compared to a 21.4% random baseline, corresponding to an error reduction rate of 3.8.
Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing
We investigated the performance efficacy of beam search parsing and deep parsing techniques in probabilistic HPSG parsing using the Penn treebank. We first tested the beam thresholding and iterative parsing developed for PCFG parsing with an HPSG. Next, we tested three techniques originally developed for deep parsing: quick check, large constituent inhibition, and hybrid parsing with a CFG chun...
A Debug Tool for Practical Grammar Development
We have developed willex, a tool that helps grammar developers to work efficiently by using annotated corpora and recording parsing errors. Willex has two major new functions. First, it decreases ambiguity of the parsing results by comparing them to an annotated corpus and removing wrong partial results both automatically and manually. Second, willex accumulates parsing errors as data for the d...
An HPSG-to-CFG Approximation of Japanese
We present a simple approximation method for turning a Head-Driven Phrase Structure Grammar into a context-free grammar. The approximation method can be seen as the construction of the least fixpoint of a certain monotonic function. We discuss an experiment with a large HPSG for Japanese.
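To make the notion of a least fixpoint concrete, here is a minimal sketch of iterating a monotonic function on sets of category symbols until nothing new is added. The rules, the lexical categories, and the function names are hypothetical and chosen only for illustration; the abstract does not specify the function the authors actually use.

```python
def least_fixpoint(step, bottom=frozenset()):
    """Iterate a monotonic set function from `bottom` until it stabilizes."""
    current = bottom
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt


# Hypothetical toy grammar: lexical categories plus rules mapping
# a tuple of daughter categories to a mother category.
LEXICAL = {"N", "V"}
RULES = {("N",): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}


def step(symbols):
    """One round: keep what is known, add the lexical categories, and add
    every mother whose daughters are all already known."""
    derived = {
        mother
        for daughters, mother in RULES.items()
        if all(d in symbols for d in daughters)
    }
    return set(symbols) | LEXICAL | derived


if __name__ == "__main__":
    print(sorted(least_fixpoint(step)))
```

The iteration terminates here because the set of symbols is finite and each round can only add symbols, which is the property that monotonicity guarantees.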